Graph-based Word Sense Disambiguation of biomedical documents
نویسندگان
چکیده
MOTIVATION Word Sense Disambiguation (WSD), automatically identifying the meaning of ambiguous words in context, is an important stage of text processing. This article presents a graph-based approach to WSD in the biomedical domain. The method is unsupervised and does not require any labeled training data. It makes use of knowledge from the Unified Medical Language System (UMLS) Metathesaurus which is represented as a graph. A state-of-the-art algorithm, Personalized PageRank, is used to perform WSD. RESULTS When evaluated on the NLM-WSD dataset, the algorithm outperforms other methods that rely on the UMLS Metathesaurus alone. AVAILABILITY The WSD system is open source licensed and available from http://ixa2.si.ehu.es/ukb/. The UMLS, MetaMap program and NLM-WSD corpus are available from the National Library of Medicine https://www.nlm.nih.gov/research/umls/, http://mmtx.nlm.nih.gov and http://wsd.nlm.nih.gov. Software to convert the NLM-WSD corpus into a format that can be used by our WSD system is available from http://www.dcs.shef.ac.uk/∼marks/biomedical_wsd under open source license.
منابع مشابه
Improving Summarization of Biomedical Documents Using Word Sense Disambiguation
We describe a concept-based summarization system for biomedical documents and show that its performance can be improved using Word Sense Disambiguation. The system represents the documents as graphs formed from concepts and relations from the UMLS. A degree-based clustering algorithm is applied to these graphs to discover different themes or topics within the document. To create the graphs, the...
متن کاملSemantic Relatedness for Biomedical Word Sense Disambiguation
This paper presents a graph-based method for all-word word sense disambiguation of biomedical texts using semantic relatedness as edge weight. Semantic relatedness is derived from a term-topic co-occurrence matrix. The sense inventory is generated by the MetaMap program. Word sense disambiguation is performed on a disambiguation graph via a vertex centrality measure. The proposed method achieve...
متن کاملDALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples
Automatic interpretation of documents is hampered by the fact that language contains terms which have multiple meanings. These ambiguities can still be found when language is restricted to a particular domain, such as biomedicine. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities but are often only able to identify the meanings for a small set of ambiguous terms. DALE...
متن کاملCan multilinguality improve Biomedical Word Sense Disambiguation?
Ambiguity in the biomedical domain represents a major issue when performing Natural Language Processing tasks over the huge amount of available information in the field. For this reason, Word Sense Disambiguation is critical for achieving accurate systems able to tackle complex tasks such as information extraction, summarization or document classification. In this work we explore whether multil...
متن کاملSense-Based Biomedical Indexing and Retrieval
This paper tackles the problem of term ambiguity, especially for biomedical literature. We propose and evaluate two methods of Word Sense Disambiguation (WSD) for biomedical terms and integrate them to a sense-based document indexing and retrieval framework. Ambiguous biomedical terms in documents and queries are disambiguated using the Medical Subject Headings (MeSH) thesaurus and semantically...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 26 22 شماره
صفحات -
تاریخ انتشار 2010